Lesson 7. Attribute and Spatial Joins

Now that we understand the logic of spatial relationship queries, let’s take a look at another fundamental spatial operation that relies on them.

This operation, called a spatial join, is the process by which we can leverage the spatial relationships between distinct datasets to merge their information into a new, synthetic dataset.

This operation can be thought as the spatial equivalent of an attribute join, in which multiple tabular datasets can be merged by aligning matching values in a common column that they both contain. Thus, we’ll start by developing an understanding of this operation first!


Instructor Notes

library(sf)
library(tmap)

7.0 Data Input and Prep

Let’s read in a table of data from the US Census’ 5-year American Community Survey (ACS5).

# Read in the ACS5 data for CA into an `sf` object.
# Note: We force the FIPS_11_digit to be read in as a string to preserve any leading zeroes.
acs5_df = read.csv("notebook_data/census/ACS5yr/census_variables_CA.csv")
head(acs5_df)
##                                            NAME c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4012, Alameda County, California   2456    1287     476     259      283
## 2 Census Tract 4013, Alameda County, California   3983     845    1348     827      796
## 3 Census Tract 4014, Alameda County, California   4340     713    1902     593      981
## 4 Census Tract 4015, Alameda County, California   2080     563    1064     215      190
## 5 Census Tract 4016, Alameda County, California   1889     324     960     247      274
## 6 Census Tract 4017, Alameda County, California   2544     553     634     302      945
##   c_race_moe c_white_moe c_black_moe c_asian_moe c_latinx_moe state_fips county_fips
## 1        213         191         116         124          195          6           1
## 2        680         186         411         283          236          6           1
## 3        644         314         440         198          620          6           1
## 4        369         222         283         116          105          6           1
## 5        400         135         376         164          149          6           1
## 6        314         140         259         144          247          6           1
##   tract_fips med_rent med_hhinc c_tenants c_owners c_renters med_rent_moe med_hhinc_moe
## 1     401200     1251     62064      1245      544       701          128         14598
## 2     401300      927     35030      1743      213      1530           64          6917
## 3     401400      884     28264      1447      274      1173           64          5802
## 4     401500      718     38684       992      449       543           80         12077
## 5     401600     1088     32663       707      140       567           79         16290
## 6     401700     1147     63052      1083      398       685          214         19070
##   c_tenants_moe c_owners_moe c_renters_moe c_movers c_stay c_movelocal c_movecounty
## 1            60           94            91     2448   1995         253          143
## 2            87           69           104     3978   2434        1114          252
## 3           106           75           125     4269   3448         699           76
## 4            82           96           114     2080   1750         211          112
## 5            88           61            90     1860   1545         148          153
## 6            95           85           116     2535   2001         350          116
##   c_movestate c_moveabroad c_movers_moe c_stay_moe c_movelocal_moe c_movecounty_moe
## 1          25           32          213        188             187               66
## 2          90           88          680        324             512              115
## 3          27           19          605        618             246               52
## 4           7            0          369        353             112              103
## 5           4           10          393        368              89              149
## 6          34           34          312        282             193               62
##   c_movestate_moe c_moveabroad_moe c_commute c_car c_carpool c_transit c_bike c_walk
## 1              39               31      1460   805        94       276    122     85
## 2              75               98      2046   698       223       801     37    214
## 3              34               26      1595   751        34       408    186    163
## 4              11               12       982   493        89       226     47     17
## 5               6               16       603   344        74       107     38      0
## 6              47               39      1585   648       284       373     21     29
##   c_commute_moe c_car_moe c_carpool_moe c_transit_moe c_bike_moe c_walk_moe year
## 1           191       144           109           105         60         43 2013
## 2           468       183           133           264         36        121 2013
## 3           396       211            28           179        108        151 2013
## 4           222       145            67            82         34         15 2013
## 5           170       128            74           120         44         12 2013
## 6           225       168           124           141         22         44 2013
##   FIPS_11_digit   p_white   p_black   p_asian   p_latinx  p_owners p_renters    p_stay
## 1    6001401200 0.5240228 0.1938111 0.1054560 0.11522801 0.4369478 0.5630522 0.8149510
## 2    6001401300 0.2121516 0.3384384 0.2076324 0.19984936 0.1222031 0.8777969 0.6118653
## 3    6001401400 0.1642857 0.4382488 0.1366359 0.22603687 0.1893573 0.8106427 0.8076833
## 4    6001401500 0.2706731 0.5115385 0.1033654 0.09134615 0.4526210 0.5473790 0.8413462
## 5    6001401600 0.1715193 0.5082054 0.1307570 0.14505029 0.1980198 0.8019802 0.8306452
## 6    6001401700 0.2173742 0.2492138 0.1187107 0.37146226 0.3674977 0.6325023 0.7893491
##   p_movelocal p_movecounty p_movestate p_moveabroad     p_car  p_carpool p_transit     p_bike
## 1  0.10334967   0.05841503 0.010212418  0.013071895 0.5513699 0.06438356 0.1890411 0.08356164
## 2  0.28004022   0.06334842 0.022624434  0.022121669 0.3411535 0.10899316 0.3914956 0.01808407
## 3  0.16373858   0.01780276 0.006324666  0.004450691 0.4708464 0.02131661 0.2557994 0.11661442
## 4  0.10144231   0.05384615 0.003365385  0.000000000 0.5020367 0.09063136 0.2301426 0.04786151
## 5  0.07956989   0.08225806 0.002150538  0.005376344 0.5704809 0.12271973 0.1774461 0.06301824
## 6  0.13806706   0.04575937 0.013412229  0.013412229 0.4088328 0.17917981 0.2353312 0.01324921
##       p_walk
## 1 0.05821918
## 2 0.10459433
## 3 0.10219436
## 4 0.01731161
## 5 0.00000000
## 6 0.01829653

Brief summary of the data:

Below is a table of the variables in this table. They were combined from different ACS 5 year tables.

NOTE: - variables that start with c_ are counts - variables that start with med_ are medians - variables that end in _moe are margin of error estimates - variables that start with _p are proportions calcuated from the counts divided by the table denominator (the total count for whom that variable was assessed)

Variable Description
c_race Total population
c_white Total white non-Latinx
c_black Total black and African American non-Latinx
c_asian Total Asian non-Latinx
c_latinx Total Latinx
state_fips State level FIPS code
county_fips County level FIPS code
tract_fips Tracts level FIPS code
med_rent Median rent
med_hhinc Median household income
c_tenants Total tenants
c_owners Total owners
c_renters Total renters
c_movers Total number of people who moved
c_stay Total number of people who stayed
c_movelocal Number of people who moved locally
c_movecounty Number of people who moved counties
c_movestate Number of people who moved states
c_moveabroad Number of people who moved abroad
c_commute Total number of commuters
c_car Number of commuters who use a car
c_carpool Number of commuters who carpool
c_transit Number of commuters who use public transit
c_bike Number of commuters who bike
c_walk Number of commuters who bike
year ACS data year
FIPS_11_digit 11-digit FIPS code

We’re going to drop all of our moe columns by identifying all of those that end with _moe. We can do that in two steps, first by using filter to identify columns that contain the string _moe.

tidyverse will help with this!

library(tidyverse) 
## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.2     ✔ purrr   0.3.4
## ✔ tibble  3.0.3     ✔ dplyr   1.0.2
## ✔ tidyr   1.1.1     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
acs5_df = acs5_df %>% select(-contains("_moe"))

Unfortunately, when this dataset reads in, the 11-digit FIPS codes that should be strings actually read in as numerics, and thus the leading 0 gets truncated. We’re going to need those FIPS code in the correct format later, so let’s reformat them now.

# recast the FIPS 11-digit codes as strings, pasting a 0 at the front of each
acs5_df$FIPS_11_digit = paste0('0', acs5_df$FIPS_11_digit)

And lastly, let’s grab only the rows for year 2018 and county FIPS code 1 (i.e. Alameda County)

acs5_df_ac = acs5_df[acs5_df$year==2018 & acs5_df$county_fips==1, ]
head(acs5_df_ac)
##                                                  NAME c_race c_white c_black c_asian c_latinx
## 8324 Census Tract 4415.01, Alameda County, California   6570     677     111    4740      570
## 8325    Census Tract 4047, Alameda County, California   2079    1515     134     199      175
## 8326    Census Tract 4425, Alameda County, California   7748    1430     375    3379     1904
## 8327    Census Tract 4503, Alameda County, California   5301    2597      96    1077     1315
## 8328 Census Tract 4506.07, Alameda County, California   5971    2832     324    1726      804
## 8329    Census Tract 4049, Alameda County, California   3973    2383     394     442      337
##      state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters
## 8324          6           1     441501     1883    152394      1796     1634       162
## 8325          6           1     404700     3250    181000       790      680       110
## 8326          6           1     442500     1999     95323      2264     1245      1019
## 8327          6           1     450300     2626    121131      1814     1249       565
## 8328          6           1     450607     1837     96061      2445      883      1562
## 8329          6           1     404900     1446    111846      1874     1095       779
##      c_movers c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car
## 8324     6491   6010         257           68         129           27      2317  1765
## 8325     2043   1822          58           77          65           21      1075   572
## 8326     7628   6858         557           29          23          161      3330  2382
## 8327     5249   4668         134          326         114            7      2819  2112
## 8328     5905   4735         715          254         201            0      3315  2316
## 8329     3939   3647         123          100          34           35      2211  1030
##      c_carpool c_transit c_bike c_walk year FIPS_11_digit   p_white    p_black   p_asian
## 8324       264       127     28      8 2018   06001441501 0.1030441 0.01689498 0.7214612
## 8325       191       170      7      6 2018   06001404700 0.7287157 0.06445406 0.0957191
## 8326       375       214     59     34 2018   06001442500 0.1845638 0.04839959 0.4361125
## 8327       183       265     28     33 2018   06001450300 0.4899076 0.01810979 0.2031692
## 8328       440       193     71    114 2018   06001450607 0.4742924 0.05426227 0.2890638
## 8329       303       522     48      0 2018   06001404900 0.5997986 0.09916939 0.1112509
##        p_latinx  p_owners  p_renters    p_stay p_movelocal p_movecounty p_movestate
## 8324 0.08675799 0.9097996 0.09020045 0.9258974  0.03959328  0.010476044 0.019873671
## 8325 0.08417508 0.8607595 0.13924051 0.8918257  0.02838962  0.037689672 0.031815957
## 8326 0.24574084 0.5499117 0.45008834 0.8990561  0.07302045  0.003801783 0.003015207
## 8327 0.24806640 0.6885336 0.31146637 0.8893122  0.02552867  0.062107068 0.021718423
## 8328 0.13465081 0.3611452 0.63885481 0.8018628  0.12108383  0.043014395 0.034038950
## 8329 0.08482255 0.5843116 0.41568837 0.9258695  0.03122620  0.025387154 0.008631632
##      p_moveabroad     p_car  p_carpool  p_transit      p_bike      p_walk
## 8324  0.004159606 0.7617609 0.11394044 0.05481226 0.012084592 0.003452741
## 8325  0.010279001 0.5320930 0.17767442 0.15813953 0.006511628 0.005581395
## 8326  0.021106450 0.7153153 0.11261261 0.06426426 0.017717718 0.010210210
## 8327  0.001333587 0.7492018 0.06491664 0.09400497 0.009932600 0.011706279
## 8328  0.000000000 0.6986425 0.13273002 0.05822021 0.021417798 0.034389140
## 8329  0.008885504 0.4658526 0.13704206 0.23609227 0.021709634 0.000000000
Now let’s also read in our census tracts again!
r tracts_sf = st_read("./notebook_data/census/Tracts/cb_2013_06_tract_500k.shp", )
## Reading layer `cb_2013_06_tract_500k' from data source `/home/drew/Desktop/stuff/berk/dlab/Geospatial-Fundamentals-in-R-with-sf/rewrite/notebook_data/census/Tracts/cb_2013_06_tract_500k.shp' using driver `ESRI Shapefile' ## Simple feature collection with 8043 features and 9 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00952 ## geographic CRS: NAD83
r head(tracts_sf)
## Simple feature collection with 6 features and 9 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -122.3049 ymin: 37.76456 xmax: -122.2074 ymax: 37.84851 ## geographic CRS: NAD83 ## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD ALAND AWATER ## 1 06 001 400300 1400000US06001400300 06001400300 4003 CT 1105329 0 ## 2 06 001 400900 1400000US06001400900 06001400900 4009 CT 420877 0 ## 3 06 001 402200 1400000US06001402200 06001402200 4022 CT 712082 0 ## 4 06 001 402800 1400000US06001402800 06001402800 4028 CT 398311 0 ## 5 06 001 404800 1400000US06001404800 06001404800 4048 CT 628405 0 ## 6 06 001 406100 1400000US06001406100 06001406100 4061 CT 1843685 74875 ## geometry ## 1 MULTIPOLYGON (((-122.2642 3... ## 2 MULTIPOLYGON (((-122.2856 3... ## 3 MULTIPOLYGON (((-122.304 37... ## 4 MULTIPOLYGON (((-122.276 37... ## 5 MULTIPOLYGON (((-122.2182 3... ## 6 MULTIPOLYGON (((-122.2387 3...
r tracts_sf_ac = tracts_sf[tracts_sf$COUNTYFP == '001',] plot(tracts_sf_ac$geometry)
# 7.1 Attribute Joins
Attribute Joins between sf data.frames and plain data.frames
We just mapped the census tracts. But what makes a map powerful is when you map the data associated with the locations.
- tracts_sf_ac: These are polygon data in an sf data.frame. However, as we saw in the head of that dataset, they no attributes of interest!
- acs5_df_ac: These are 2018 ACS data from a CSV file (‘census_variables_CA.csv’), imported and read in as a plain data.frame. However, they have no geometries!
In order to map the ACS data we need to associate it with the tracts. Let’s do that now, by joining the columns from acs5_df_ac to the columns of tracts_gdf_ac using a common column as the key for matching rows. This process is called an attribute join.

<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 

Question

The image above gives us a nice conceptual summary of the types of joins we could run.

  1. In general, why might we choose one type of join over another?
  2. In our case, do we want an inner, left, right, or outer (AKA ‘full’) join?

(NOTE: You can read more about merging sf and plain data.frames here.)

Okay, here we go!

Let’s take a look at the common column in both our data.frames.

head(tracts_sf_ac['GEOID'])
## Simple feature collection with 6 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.3049 ymin: 37.76456 xmax: -122.2074 ymax: 37.84851
## geographic CRS: NAD83
##         GEOID                       geometry
## 1 06001400300 MULTIPOLYGON (((-122.2642 3...
## 2 06001400900 MULTIPOLYGON (((-122.2856 3...
## 3 06001402200 MULTIPOLYGON (((-122.304 37...
## 4 06001402800 MULTIPOLYGON (((-122.276 37...
## 5 06001404800 MULTIPOLYGON (((-122.2182 3...
## 6 06001406100 MULTIPOLYGON (((-122.2387 3...
head(acs5_df_ac['FIPS_11_digit'])
##      FIPS_11_digit
## 8324   06001441501
## 8325   06001404700
## 8326   06001442500
## 8327   06001450300
## 8328   06001450607
## 8329   06001404900

Note that they are not named the same thing.

    That's okay! We just need to know that they contain the same information.

Also note that they are not in the same order.

    That's not only okay... That's the point! (If they were in the same order already then we could just join them side by side, without having R find and line up the matching rows from each!)

Let’s do a left join to keep all of the census tracts in Alameda County and only the ACS data for those tracts.

NOTE: To figure out how to do this we could always take a peek at the documentation by calling ?base::merge.

?base::merge
# Left join keeps all tracts and the acs data for those tracts
tracts_acs_sf_ac = base::merge(tracts_sf_ac, acs5_df_ac, by.x = 'GEOID', by.y = "FIPS_11_digit", all.x=TRUE)
head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
##         GEOID STATEFP COUNTYFP TRACTCE             AFFGEOID NAME.x LSAD   ALAND AWATER
## 1 06001400100      06      001  400100 1400000US06001400100   4001   CT 6894334      0
## 2 06001400200      06      001  400200 1400000US06001400200   4002   CT  587453      0
## 3 06001400300      06      001  400300 1400000US06001400300   4003   CT 1105329      0
## 4 06001400400      06      001  400400 1400000US06001400400   4004   CT  714729      0
## 5 06001400500      06      001  400500 1400000US06001400500   4005   CT  590307      0
## 6 06001400600      06      001  400600 1400000US06001400600   4006   CT  297856      0
##                                          NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California   3115    2055     128     592      104
## 2 Census Tract 4002, Alameda County, California   2025    1436      59     183      178
## 3 Census Tract 4003, Alameda County, California   5000    3458     380     535      364
## 4 Census Tract 4004, Alameda County, California   3843    2551     229     373      420
## 5 Census Tract 4005, Alameda County, California   3786    1885     990     264      334
## 6 Census Tract 4006, Alameda County, California   1638     817     343     144      124
##   state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1          6           1     400100     3501    200893      1297     1158       139     3084
## 2          6           1     400200     1902    160536       855      532       323     2012
## 3          6           1     400300     1481     94732      2441     1057      1384     4948
## 4          6           1     400400     1624    113036      1750      653      1097     3754
## 5          6           1     400500     1627    103846      1622      605      1017     3745
## 6          6           1     400600     1640    127232       657      283       374     1587
##   c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1   2514         202          120         219           29      1542   864       165
## 2   1827          40           46          39           60      1211   500        40
## 3   4159         223          289         156          121      2975  1252       177
## 4   3440         113          133          46           22      2293   933       240
## 5   3065         309          180         145           46      2325   983       156
## 6   1221         152          146          68            0      1022   370       120
##   c_transit c_bike c_walk year   p_white    p_black    p_asian   p_latinx  p_owners p_renters
## 1       271      0     10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2       426     62     57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3       835    202    171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4       588    188     92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5       619    177     94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6       268     93      3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
##      p_stay p_movelocal p_movecounty p_movestate p_moveabroad     p_car  p_carpool p_transit
## 1 0.8151751  0.06549935   0.03891051  0.07101167  0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517  0.01988072   0.02286282  0.01938370  0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416  0.04506871   0.05840744  0.03152789  0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559  0.03010123   0.03542888  0.01225360  0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246  0.08251001   0.04806409  0.03871829  0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762  0.09577820   0.09199748  0.04284814  0.000000000 0.3620352 0.11741683 0.2622309
##       p_bike      p_walk                       geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...

Let’s check that we have all the variables we have in our dataset now.

colnames(tracts_acs_sf_ac)
##  [1] "GEOID"        "STATEFP"      "COUNTYFP"     "TRACTCE"      "AFFGEOID"     "NAME.x"      
##  [7] "LSAD"         "ALAND"        "AWATER"       "NAME.y"       "c_race"       "c_white"     
## [13] "c_black"      "c_asian"      "c_latinx"     "state_fips"   "county_fips"  "tract_fips"  
## [19] "med_rent"     "med_hhinc"    "c_tenants"    "c_owners"     "c_renters"    "c_movers"    
## [25] "c_stay"       "c_movelocal"  "c_movecounty" "c_movestate"  "c_moveabroad" "c_commute"   
## [31] "c_car"        "c_carpool"    "c_transit"    "c_bike"       "c_walk"       "year"        
## [37] "p_white"      "p_black"      "p_asian"      "p_latinx"     "p_owners"     "p_renters"   
## [43] "p_stay"       "p_movelocal"  "p_movecounty" "p_movestate"  "p_moveabroad" "p_car"       
## [49] "p_carpool"    "p_transit"    "p_bike"       "p_walk"       "geometry"
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 

Question

It’s always important to run sanity checks on our results, at each step of the way!

In this case, how many rows and columns should we have?

print("Rows and columns in the Alameda County Census tract gdf:")
## [1] "Rows and columns in the Alameda County Census tract gdf:"
print(dim(tracts_sf_ac))
## [1] 361  10
print("Row and columns in the ACS5 2018 data:")
## [1] "Row and columns in the ACS5 2018 data:"
print(dim(acs5_df_ac))
## [1] 361  44
print("Rows and columns in the Alameda County Census tract gdf joined to the ACS data:")
## [1] "Rows and columns in the Alameda County Census tract gdf joined to the ACS data:"
print(dim(tracts_acs_sf_ac))
## [1] 361  53

Let’s save out our merged data so we can use it in the final notebook.

st_write(tracts_acs_sf_ac, './outdata/tracts_acs_gdf_ac.json', driver='GeoJSON', delete_dsn=T)
## Deleting source `./outdata/tracts_acs_gdf_ac.json' using driver `GeoJSON'
## Writing layer `tracts_acs_gdf_ac' to data source `./outdata/tracts_acs_gdf_ac.json' using driver `GeoJSON'
## Writing 361 features with 52 fields and geometry type Multi Polygon.

Exercise: Choropleth Map

We can now make choropleth maps using our attribute joined geodataframe. Go ahead and pick one variable to color the map, then map it using tmap (since it’s too easy using the plot method). You can go back to lesson 5 if you need a refresher on how to make this!

To see the solution, double-click the Markdown cell below.

head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
##         GEOID STATEFP COUNTYFP TRACTCE             AFFGEOID NAME.x LSAD   ALAND AWATER
## 1 06001400100      06      001  400100 1400000US06001400100   4001   CT 6894334      0
## 2 06001400200      06      001  400200 1400000US06001400200   4002   CT  587453      0
## 3 06001400300      06      001  400300 1400000US06001400300   4003   CT 1105329      0
## 4 06001400400      06      001  400400 1400000US06001400400   4004   CT  714729      0
## 5 06001400500      06      001  400500 1400000US06001400500   4005   CT  590307      0
## 6 06001400600      06      001  400600 1400000US06001400600   4006   CT  297856      0
##                                          NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California   3115    2055     128     592      104
## 2 Census Tract 4002, Alameda County, California   2025    1436      59     183      178
## 3 Census Tract 4003, Alameda County, California   5000    3458     380     535      364
## 4 Census Tract 4004, Alameda County, California   3843    2551     229     373      420
## 5 Census Tract 4005, Alameda County, California   3786    1885     990     264      334
## 6 Census Tract 4006, Alameda County, California   1638     817     343     144      124
##   state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1          6           1     400100     3501    200893      1297     1158       139     3084
## 2          6           1     400200     1902    160536       855      532       323     2012
## 3          6           1     400300     1481     94732      2441     1057      1384     4948
## 4          6           1     400400     1624    113036      1750      653      1097     3754
## 5          6           1     400500     1627    103846      1622      605      1017     3745
## 6          6           1     400600     1640    127232       657      283       374     1587
##   c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1   2514         202          120         219           29      1542   864       165
## 2   1827          40           46          39           60      1211   500        40
## 3   4159         223          289         156          121      2975  1252       177
## 4   3440         113          133          46           22      2293   933       240
## 5   3065         309          180         145           46      2325   983       156
## 6   1221         152          146          68            0      1022   370       120
##   c_transit c_bike c_walk year   p_white    p_black    p_asian   p_latinx  p_owners p_renters
## 1       271      0     10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2       426     62     57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3       835    202    171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4       588    188     92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5       619    177     94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6       268     93      3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
##      p_stay p_movelocal p_movecounty p_movestate p_moveabroad     p_car  p_carpool p_transit
## 1 0.8151751  0.06549935   0.03891051  0.07101167  0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517  0.01988072   0.02286282  0.01938370  0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416  0.04506871   0.05840744  0.03152789  0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559  0.03010123   0.03542888  0.01225360  0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246  0.08251001   0.04806409  0.03871829  0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762  0.09577820   0.09199748  0.04284814  0.000000000 0.3620352 0.11741683 0.2622309
##       p_bike      p_walk                       geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...
# YOUR CODE HERE

Double-click to see solution!

# 7.2 Spatial Joins
Great! We’ve wrapped our heads around the concept of an attribute join.
Now let’s extend that concept to its spatially explicit equivalent: the spatial join!

To start, we’ll read in some other data: The Alameda County schools data.
Then we’ll work with that data and our tracts_acs_sf_ac data together.
r schools_df = read.csv('notebook_data/alco_schools.csv') schools_sf = st_as_sf(schools_df, coords = c('X', 'Y'), crs=4326)
Let’s check if we have to transform the schools to match thetracts_acs_sf_ac’s CRS.
r print('schools_sf CRS:')
## [1] "schools_sf CRS:"
r print(st_crs(schools_sf))
## Coordinate Reference System: ## User input: EPSG:4326 ## wkt: ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## USAGE[ ## SCOPE["unknown"], ## AREA["World"], ## BBOX[-90,-180,90,180]], ## ID["EPSG",4326]]
r print('tracts_acs_sf_ac CRS:')
## [1] "tracts_acs_sf_ac CRS:"
r print(st_crs(tracts_acs_sf_ac))
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]]
Yes we do! Let’s do that.
NOTE: Explicit syntax aiming at that dataset’s CRS leaves less room for human error!
```r schools_sf = st_transform(schools_sf, st_crs(tracts_acs_sf_ac))
print(‘schools_sf CRS:’) ```
## [1] "schools_sf CRS:"
r print(st_crs(schools_sf))
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]]
r print('tracts_acs_sf_ac CRS:')
## [1] "tracts_acs_sf_ac CRS:"
r print(st_crs(tracts_acs_sf_ac))
## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]]
Now we’re ready to combine the datasets in an analysis.
In this case, we want to get data from the census tract within which each school is located.
But how can we do that? The two datasets don’t share a common column to use for a join.
r colnames(tracts_acs_sf_ac)
## [1] "GEOID" "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" "NAME.x" ## [7] "LSAD" "ALAND" "AWATER" "NAME.y" "c_race" "c_white" ## [13] "c_black" "c_asian" "c_latinx" "state_fips" "county_fips" "tract_fips" ## [19] "med_rent" "med_hhinc" "c_tenants" "c_owners" "c_renters" "c_movers" ## [25] "c_stay" "c_movelocal" "c_movecounty" "c_movestate" "c_moveabroad" "c_commute" ## [31] "c_car" "c_carpool" "c_transit" "c_bike" "c_walk" "year" ## [37] "p_white" "p_black" "p_asian" "p_latinx" "p_owners" "p_renters" ## [43] "p_stay" "p_movelocal" "p_movecounty" "p_movestate" "p_moveabroad" "p_car" ## [49] "p_carpool" "p_transit" "p_bike" "p_walk" "geometry"
r colnames(schools_sf)
## [1] "Site" "Address" "City" "State" "Type" "API" "Org" "geometry"
However, they do have a shared relationship by way of space!
So, we’ll use a spatial relationship query to figure out the census tract that each school is in, then associate the tract’s data with that school (as additional data in the school’s row). This is a spatial join!

Census Tract Data Associated with Each School

In this case, let’s say we’re interested in the relationship between the median household income in a census tract (tracts_acs_sf_ac$med_hhinc) and a school’s Academic Performance Index (schools_gdf$API).

To start, let’s take a look at the distributions of our two variables of interest.

head(tracts_acs_sf_ac)
## Simple feature collection with 6 features and 52 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
##         GEOID STATEFP COUNTYFP TRACTCE             AFFGEOID NAME.x LSAD   ALAND AWATER
## 1 06001400100      06      001  400100 1400000US06001400100   4001   CT 6894334      0
## 2 06001400200      06      001  400200 1400000US06001400200   4002   CT  587453      0
## 3 06001400300      06      001  400300 1400000US06001400300   4003   CT 1105329      0
## 4 06001400400      06      001  400400 1400000US06001400400   4004   CT  714729      0
## 5 06001400500      06      001  400500 1400000US06001400500   4005   CT  590307      0
## 6 06001400600      06      001  400600 1400000US06001400600   4006   CT  297856      0
##                                          NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4001, Alameda County, California   3115    2055     128     592      104
## 2 Census Tract 4002, Alameda County, California   2025    1436      59     183      178
## 3 Census Tract 4003, Alameda County, California   5000    3458     380     535      364
## 4 Census Tract 4004, Alameda County, California   3843    2551     229     373      420
## 5 Census Tract 4005, Alameda County, California   3786    1885     990     264      334
## 6 Census Tract 4006, Alameda County, California   1638     817     343     144      124
##   state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1          6           1     400100     3501    200893      1297     1158       139     3084
## 2          6           1     400200     1902    160536       855      532       323     2012
## 3          6           1     400300     1481     94732      2441     1057      1384     4948
## 4          6           1     400400     1624    113036      1750      653      1097     3754
## 5          6           1     400500     1627    103846      1622      605      1017     3745
## 6          6           1     400600     1640    127232       657      283       374     1587
##   c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1   2514         202          120         219           29      1542   864       165
## 2   1827          40           46          39           60      1211   500        40
## 3   4159         223          289         156          121      2975  1252       177
## 4   3440         113          133          46           22      2293   933       240
## 5   3065         309          180         145           46      2325   983       156
## 6   1221         152          146          68            0      1022   370       120
##   c_transit c_bike c_walk year   p_white    p_black    p_asian   p_latinx  p_owners p_renters
## 1       271      0     10 2018 0.6597111 0.04109149 0.19004815 0.03338684 0.8928296 0.1071704
## 2       426     62     57 2018 0.7091358 0.02913580 0.09037037 0.08790123 0.6222222 0.3777778
## 3       835    202    171 2018 0.6916000 0.07600000 0.10700000 0.07280000 0.4330193 0.5669807
## 4       588    188     92 2018 0.6638043 0.05958886 0.09705959 0.10928962 0.3731429 0.6268571
## 5       619    177     94 2018 0.4978870 0.26148970 0.06973059 0.08821976 0.3729963 0.6270037
## 6       268     93      3 2018 0.4987790 0.20940171 0.08791209 0.07570208 0.4307458 0.5692542
##      p_stay p_movelocal p_movecounty p_movestate p_moveabroad     p_car  p_carpool p_transit
## 1 0.8151751  0.06549935   0.03891051  0.07101167  0.009403372 0.5603113 0.10700389 0.1757458
## 2 0.9080517  0.01988072   0.02286282  0.01938370  0.029821074 0.4128819 0.03303055 0.3517754
## 3 0.8405416  0.04506871   0.05840744  0.03152789  0.024454325 0.4208403 0.05949580 0.2806723
## 4 0.9163559  0.03010123   0.03542888  0.01225360  0.005860416 0.4068905 0.10466638 0.2564326
## 5 0.8184246  0.08251001   0.04806409  0.03871829  0.012283044 0.4227957 0.06709677 0.2662366
## 6 0.7693762  0.09577820   0.09199748  0.04284814  0.000000000 0.3620352 0.11741683 0.2622309
##       p_bike      p_walk                       geometry
## 1 0.00000000 0.006485084 MULTIPOLYGON (((-122.2469 3...
## 2 0.05119736 0.047068538 MULTIPOLYGON (((-122.2574 3...
## 3 0.06789916 0.057478992 MULTIPOLYGON (((-122.2642 3...
## 4 0.08198866 0.040122111 MULTIPOLYGON (((-122.2618 3...
## 5 0.07612903 0.040430108 MULTIPOLYGON (((-122.2694 3...
## 6 0.09099804 0.002935421 MULTIPOLYGON (((-122.2681 3...
hist(tracts_acs_sf_ac$med_hhinc)

hist(schools_sf$API)

Oh, right! Those pesky schools with no reported APIs (i.e. API == 0)! Let’s drop those.

schools_sf_api = schools_sf[schools_sf$API > 0, ]
hist(schools_sf_api$API)

Much better!

Now, maybe we think there ought to be some correlation between the two variables? As a first pass at this possibility, let’s overlay the two datasets, coloring each one by its variable of interest. This should give us a sense of whether or not similar values co-occur.

tm_shape(tracts_acs_sf_ac) + 
  tm_polygons(col = 'med_hhinc',
             palette = 'RdPu') + 
tm_shape(schools_sf_api) + 
  tm_dots(col = 'API',
          palette = 'RdPu',
          size = 0.15)

Spatially Joining our Schools and Census Tracts

Though it’s hard to say for sure, it certainly looks possible. It would be ideal to scatterplot the variables! But in order to do that, we need to know the median household income in each school’s tract, which means we definitely need our spatial join!

We’ll first take a look at the documentation for the spatial join function, st_join.

?st_join

Looks like the key arguments to consider are: - the two sf data.frames (x and y) - the type of join to run (left), which is a left join if TRUE, or an inner join if FALSE - the spatial relationship query to use (join)

NOTE: - By default st_join is a left join, because left defaults to TRUE.

  • By default st_join maintains the geometries of the first sf data.frame input to the operation (i.e. the geometries of x).
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 

Questions

  1. Which sf data.frame are we joining onto which (i.e. which one is getting the other one’s data added to it)?
  2. What happened to ‘outer’ as a join type?
  3. Thus, in our operation, which sf data.frame should be x, which should be y, and should left be TRUE or FALSE?

Alright! Let’s run our join!

schools_jointracts = st_join(schools_sf_api, tracts_acs_sf_ac, left=T, join=st_within)
## although coordinates are longitude/latitude, st_within assumes that they are planar

Checking Our Output


<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 

Questions

As always, we want to sanity-check our intermediate result before we rush ahead.

One way to do that is to introspect the structure of the result object a bit.

  1. What type of object should that have given us?
  2. What should the dimensions of that object be, and why?
  3. If we wanted a visual check of our results (i.e. a plot or map), what could we do?
print(dim(schools_jointracts))
## [1] 325  60
print(dim(schools_sf))
## [1] 550   8
print(dim(tracts_acs_sf_ac))
## [1] 361  53
head(schools_jointracts)
## Simple feature collection with 6 features and 59 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.2616 ymin: 37.739 xmax: -122.2348 ymax: 37.76911
## geographic CRS: NAD83
##                        Site               Address    City State Type API    Org       GEOID
## 1 Amelia Earhart Elementary 400 Packet Landing Rd Alameda    CA   ES 933 Public 06001428302
## 2       Bay Farm Elementary   200 Aughinbaugh Way Alameda    CA   ES 932 Public 06001428302
## 3  Donald D. Lum Elementary    1801 Sandcreek Way Alameda    CA   ES 853 Public 06001428500
## 4         Edison Elementary  2700 Buena Vista Ave Alameda    CA   ES 927 Public 06001427100
## 5     Frank Otis Elementary      3010 Fillmore St Alameda    CA   ES 894 Public 06001428200
## 6       Franklin Elementary  1433 San Antonio Ave Alameda    CA   ES 893 Public 06001427900
##   STATEFP COUNTYFP TRACTCE             AFFGEOID  NAME.x LSAD   ALAND AWATER
## 1      06      001  428302 1400000US06001428302 4283.02   CT 2234427 474770
## 2      06      001  428302 1400000US06001428302 4283.02   CT 2234427 474770
## 3      06      001  428500 1400000US06001428500    4285   CT  624197 383522
## 4      06      001  427100 1400000US06001427100    4271   CT 1025446 168187
## 5      06      001  428200 1400000US06001428200    4282   CT 1366564 800242
## 6      06      001  427900 1400000US06001427900    4279   CT  825214      0
##                                             NAME.y c_race c_white c_black c_asian c_latinx
## 1 Census Tract 4283.02, Alameda County, California   7501    3114     273    3070      562
## 2 Census Tract 4283.02, Alameda County, California   7501    3114     273    3070      562
## 3    Census Tract 4285, Alameda County, California   3152    1293     268     940      378
## 4    Census Tract 4271, Alameda County, California   3768    2446      67     574      457
## 5    Census Tract 4282, Alameda County, California   6643    3468     387    1589      686
## 6    Census Tract 4279, Alameda County, California   5031    2554     228    1407      563
##   state_fips county_fips tract_fips med_rent med_hhinc c_tenants c_owners c_renters c_movers
## 1          6           1     428302     1946    147667      2540     2015       525     7436
## 2          6           1     428302     1946    147667      2540     2015       525     7436
## 3          6           1     428500     1873     88333      1422      485       937     3125
## 4          6           1     427100     1767    135833      1420     1040       380     3724
## 5          6           1     428200     1855    103781      2641     1468      1173     6587
## 6          6           1     427900     1712    102077      2030      734      1296     4971
##   c_stay c_movelocal c_movecounty c_movestate c_moveabroad c_commute c_car c_carpool
## 1   6705         395           99         175           62      3812  2595       296
## 2   6705         395           99         175           62      3812  2595       296
## 3   2641         282          102         100            0      1514   910        65
## 4   3498         121           86           0           19      1755   986       136
## 5   6155         205           71         141           15      3391  2189       229
## 6   4470         108          244         130           19      3102  1705       380
##   c_transit c_bike c_walk year   p_white    p_black   p_asian   p_latinx  p_owners p_renters
## 1       409     18     73 2018 0.4151446 0.03639515 0.4092788 0.07492334 0.7933071 0.2066929
## 2       409     18     73 2018 0.4151446 0.03639515 0.4092788 0.07492334 0.7933071 0.2066929
## 3       374     50     18 2018 0.4102157 0.08502538 0.2982234 0.11992386 0.3410689 0.6589311
## 4       303     33     64 2018 0.6491507 0.01778132 0.1523355 0.12128450 0.7323944 0.2676056
## 5       510     51    108 2018 0.5220533 0.05825681 0.2391992 0.10326660 0.5558501 0.4441499
## 6       624     41     52 2018 0.5076526 0.04531902 0.2796661 0.11190618 0.3615764 0.6384236
##      p_stay p_movelocal p_movecounty p_movestate p_moveabroad     p_car  p_carpool p_transit
## 1 0.9016945  0.05311996   0.01331361  0.02353416  0.008337816 0.6807450 0.07764953 0.1072928
## 2 0.9016945  0.05311996   0.01331361  0.02353416  0.008337816 0.6807450 0.07764953 0.1072928
## 3 0.8451200  0.09024000   0.03264000  0.03200000  0.000000000 0.6010568 0.04293263 0.2470277
## 4 0.9393126  0.03249194   0.02309345  0.00000000  0.005102041 0.5618234 0.07749288 0.1726496
## 5 0.9344163  0.03112191   0.01077881  0.02140580  0.002277213 0.6455323 0.06753170 0.1503981
## 6 0.8992154  0.02172601   0.04908469  0.02615168  0.003822169 0.5496454 0.12250161 0.2011605
##        p_bike     p_walk                   geometry
## 1 0.004721931 0.01915005 POINT (-122.2388 37.74476)
## 2 0.004721931 0.01915005   POINT (-122.2519 37.739)
## 3 0.033025099 0.01188904 POINT (-122.2589 37.76206)
## 4 0.018803419 0.03646724 POINT (-122.2348 37.76525)
## 5 0.015039811 0.03184901 POINT (-122.2381 37.75396)
## 6 0.013217279 0.01676338 POINT (-122.2616 37.76911)

Confirmed! The output of the our st_join operation is an sf data.frame (schools_jointracts) with: - a row for each school that is located inside a census tract (all of them are) - the point geometry of that school - all of the attribute data columns (non-geometry columns) from both input sf data.frames


Let’s also take a look at an overlay map of the schools on the tracts. If we color the schools categorically by their tracts IDs, then we should see that all schools within a given tract polygon are the same color.

tm_shape(tracts_acs_sf_ac) + 
  tm_polygons(col='white', border.col='black') + 
tm_shape(schools_jointracts) + 
  tm_dots(col='GEOID', size=0.2)
## Warning: Number of levels of the variable "GEOID" is 208, which is larger than max.categories
## (which is 30), so levels are combined. Set tmap_options(max.categories = 208) in the layer
## function to show all levels.

Assessing the Relationship between Median Household Income and API

Fantastic! That looks right!

Now we can create that scatterplot we were thinking about!

plot(schools_jointracts$med_hhinc, schools_jointracts$API,
     xlab = 'median household income ($)',
     ylab = 'API')

Wow! Just as we suspected based on our overlay map, there’s a pretty obvious, strong, and positive correlation between median household income in a school’s tract and the school’s API.

7.3: Aggregation

We just saw that a spatial join in one way to leverage the spatial relationship between two datasets in order to create a new, synthetic dataset.

An aggregation is another way we can generate new data from this relationship. In this case, for each feature in one dataset we find all the features in another dataset that satisfy our chosen spatial relationship query with it (e.g. within, intersects), then aggregate them using some summary function (e.g. count, mean).


Getting the Aggregated School Counts

Let’s take this for a spin with our data. We’ll count all the schools within each census tract.

We could do this using an aspatial group-by operation on the GEOID column of the new, spatially joined dataset that we just made. However, since we’re in a geospatial workshop let’s use a spatial aggregation instead!

(Also, to get the correct count, lets use all our schools, not just those with APIs > 0.)

schools_for_count = schools_sf['geometry']
schools_for_count$count = 1
schools_countsbytract = sf:::aggregate.sf(x=schools_for_count, by=tracts_acs_sf_ac, FUN=sum)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar

Let’s see what we got out.

print("Counts, rows and columns:")
## [1] "Counts, rows and columns:"
print(dim(schools_countsbytract))
## [1] 361   2
print("Tracts, rows and columns:")
## [1] "Tracts, rows and columns:"
print(dim(tracts_acs_sf_ac))
## [1] 361  53
# take a look at the data
head(schools_countsbytract)
## Simple feature collection with 6 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2694 ymin: 37.83454 xmax: -122.2124 ymax: 37.88544
## geographic CRS: NAD83
##   count                       geometry
## 1     1 MULTIPOLYGON (((-122.2469 3...
## 2     1 MULTIPOLYGON (((-122.2574 3...
## 3    NA MULTIPOLYGON (((-122.2642 3...
## 4     2 MULTIPOLYGON (((-122.2618 3...
## 5     1 MULTIPOLYGON (((-122.2694 3...
## 6    NA MULTIPOLYGON (((-122.2681 3...
<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 

Questions

  1. Above we selected the geometry column, added a column of 1s, then aggregated. Why?
  2. What explains the dimensions of the new object (361, 2)?

Mapping our Spatial Join Output

As a sanity-check, we can now map the school counts for all census tracts.

tm_shape(schools_countsbytract) + 
  tm_polygons(col='count') + 
tm_shape(schools_sf) + 
  tm_dots(col='black', alpha=0.75, size=0.1)

Exercise: Aggregation

What is the mean API of each census tract?

As we mentioned, the spatial aggregation workflow that we just put together above could have been used not to generate a new count variable, but also to generate any other new variable the results from calling an aggregation function on an attribute column.

In this case, we want to calculate and map the mean API of the schools in each census tract.

Copy and paste code from above where useful, then tweak and/or add to that code such that your new code: 1. joins the schools onto the tracts (HINT: make sure to decide whether or not you want to include schools with API = 0!) 1. dissolves that joined object by the tract IDs, giving you a new GeoDataFrame with each tract’s mean API (HINT: because this is now a different calculation, different problems may arise and need handling!) 1. plots the tracts, colored by API scores (HINT: overlay the schools points again, visualizing them in a way that will help you visually check your results!)

To see the solution, double-click the Markdown cell below.

# YOUR CODE HERE:

Double-click to see solution!


7.4 Recap

We discussed how we can combine datasets to enhance any geospatial data analyses you could do. Key concepts include: - Attribute joins - merge() - Spatial joins (order matters!) - st_join - Aggregation - aggregate.sf


<div style="font-size:larger">&nbsp;D-Lab @ University of California - Berkeley</div>
<div>&nbsp;Team Geo<div>